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ABSTRACT 

This article describes a new, fully adaptive Particle-Multiple-Mesh (PM^) numerical 
simulation code developed primarily for cosmological applications. The code integrates 
the equations of motion of a set of particles subject to their mutual gravitational inter- 
action and to an optional, arbitrary external field. The interactions between particles 
are computed using a hierarchy of nested grids constructed anew at each integration 
step to enhance the spatial resolution in high-density regions of interest. As the code 
is aimed at simulations of relatively small volumes of space (not much larger than a 
single group of galaxies) with independent control over the external tidal fields, signifi- 
cant effort has gone into supporting isolated boundary conditions at the top grid level. 
This makes our method also applicable to non-cosmological problems, at the cost of 
some complications which we discuss. We point out the implications of some differences 
between our approach and those of other authors of similar codes, in particular with 
respect to the handling of the interface between regions of different spatial resolution. 
We present a selection of tests performed to verify the correctness and performance 
of our implementation. The conclusion suggests possible further improvements in the 
areas of independent time steps and particle softening lengths. 

Subject headings: methods: numerical 

1. Introduction 

Ideally, a good cosmological A^-body simulation should resolve in some detail the internal 
structure of individual galaxies (on mass scales as low as 10^ to 10'' Mq and on length scales of 
a few kiloparsecs or less) while at the same time representing the growth of density perturbations 
on the scale of clusters and even superclusters of galaxies. Galaxy morphologies are observed to 
correlate with their membership in clusters. The Virgo cluster is estimated to be an appreciable. 
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if not dominant, source of tides on the Local Group 15 to 20 Mpc away. And when studying the 
nonhnear growth of galaxy-sized perturbations, one should not discount the possible coupling to 
modes with wavelengths as large as 50 Mpc ( Gelb &: Bertschinger 199^ ). 



Achieving the required dynamic range is a difficult task given the limitations of present-day 
computers. Generally, numerical codes have been able to obtain a large dynamic range either in 
mass or in length, but not in both. Particle-particle (PP) methods ( [Aarseth 1985 ) and tree codes 



(Appel 1981, 1985; [Barnes fc Hut 1986 ; Jernigan fc Porter 1989| ) have essentially unlimited length 



resolution, bounded only by the need to soften two-body encounters when modeling a collisionless 
physical system, but their use for simulations with more than a few tens, respectively hundreds, of 
thousands of particles is currently restricted to special-purpose hardware (such as GRAPE chips) or 
massively parallel computers ( Dubinski 1996| ). Particle-mesh (PM) methods ( [Hockney &z Eastwood 



1981, hereafter HE; Efstathiou et al. 1985| ), by contrast, can integrate the orbits of millions of 



interacting particles even on a more modest single-processor mainframe or high-end workstation. 
The forces are computed by solving Poisson's equation on a Cartesian grid. Barring complications 
such as a memory latency that increases with problem size, the computational cost per time step is 
linear in the number of particles and of order N log N in the number of grid cells (which is typically 
of the same order as the number of particles). The dynamic range in length, however, is limited by 
the maximum size of the grid, typically 256 or 512 cells on a side in three dimensions on a current 
single-processor computer. Parallel processing can naturally push this limit towards higher values. 

One can circumvent this obstacle by decomposing the inter-particle forces into a long-range 
part, calculated by the PM technique, and a small-range part which can be evaluated in a number 
of different ways. For some time, a popular approach has been to compute the short-range forces 
by direct summation over all sufficiently close pairs of particles. This technique, known as the 
particle-particle, particle- mesh (P^M) method (HE §8), was originally applied to plasma simula- 
tions where the electrostatic repulsion between charges of the same sign makes it difficult for large 
density contrasts to develop. Its use with gravitating systems suffers from the tendency for ever 
more particles to condense into small volumes, causing the cost of the PP summation to become 
prohibitive as the system evolves. Increasing the total number of particles and resolving smaller 
scales (on which density perturbations become nonlinear at earlier times, at least in the presently 
favored "bottom-up" scenario) both exacerbate the problem. We are unaware of any P'^M simula- 
tions using more than about 5 x 10^ particles. One may be able to raise this limit by using a tree 
method to compute the short-range forces, as in Xu's (1995) TPM, but in principle the difficulty 
remains. 

Most other attempts to enhance the spatial resolution of PM codes rely on the introduction 
of local grid refinements. This idea lies at the root of Couchman's (1991) adaptive P^M (AP^M) 
(which however still uses direct summation where the number of particles is small enough), of 



numerous hierarchical PM codes ( 


Chan et al. 1986; 


Villumsen 1989]; Bien, Fuchs, & Wielen 1991; 


Anninos, Norman & Clarke 199^^, hereafter ANC; Gelato, Chernoff, & Wasserman 1994 




Jessop, 


Duncan & Chau 1994 


, hereafter JDC; 


Sphnter 1996r 


and of various related approaches ( 


James & 
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Weeks 1986; [Suisalu k Saar 1995a|) . 



As pointed out by various authors, and most recently by Suisalu and Saar (1995b), the choice 
between PM, P^M, and tree codes is not only a question of the spatial and mass resolution that 
can be achieved, but also of the behavior of each method with respect to two-body gravitational 
collisions. These are to be avoided when the physical system being modeled is essentially collision- 
less, as is the case for example for cosmological dark matter. PM codes usually prove to be less 
collisional, although it should be emphasized that this is at least in part a consequence of their 
spatial resolution being weaker than their mass resolution, and need not carry over to hierarchical 
PM methods unless precautions are taken to ensure that the mass granularity remains adequate at 
all times. 

The purpose of this article is to document in some detail our implementation of a dynamically 
adaptive multiple-mesh code and the tests we performed to validate it. So far, only a brief report 
of an earlier stage of development ( Pelato, Chernoff, fc Wasserman 1994 ) and a description of 



the methodology but not of the tests (Gelato 1995) have appeared. Our code, like that of JDC, 



is able to track the formation and subsequent motion of density concentrations by dynamically 
adjusting the number, nesting depth and location of the subgrids. In our case, the entire grid 
structure is chosen afresh on every step. Since our first intended application was a cosmological 
problem — the formation of the Local Group — that requires the freedom to specify external tidal 
fields other than would be implied by periodic image charges, our code presents the relatively 
uncommon combination of an expanding system of coordinates with isolating (more properly non- 
periodic) boundary conditions. This introduces a number of complications not usually discussed 
in the literature on PM codes, and of which we shall give an account here. (Our method is also 
applicable to non-cosmological problems, for which isolating boundary conditions are an asset and 
the aforementioned complications do not arise.) A future article ( |Gelato, Chernoff, fc Wasserman 



1996) will detail the results of our simulations of group formation. 



The general outline of this paper is as follows. Section ^ describes in detail the method used. 
The tests and their results are presented in section ^. We conclude, in section ^, with our assessment 
of the strengths and limitations of this code. 



2. Description of the method 

2.1. Equations of motion 

Our code integrates the equations of motion for a set of gravitating particles. These equations 
can be derived from the Lagrangian 

^ = ^Yl^il^il"^ + ^YlJ2^i^jVi^ii^jii) + \-^Yl^i\^i\'^ -Y^mi^x{ri;t). (1) 

i i jy^i i i 
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Here the indices i and j span the set of particles in the system, rrii is the mass of particle i, rj(t) 
its position at time t, and V{r,r';t) describes the law of pairwise interaction between particles. 
Ideally V{r,r';t) = l/|r — r'| (we choose units such that G = 1) but the numerical method forces 
us to use an approximation which, due to adaptive grid refinement, can depend explicitly on time 
and on the individual coordinates r and r', not just on their difference. 

We could have included the term proportional to A (where A is an arbitrary constant) within 
the external potential ^x, but for cosmological applications it is useful to show it explicitly. 



2.2. Expanding coordinates 

It is often convenient to apply a time-dependent rescaling of the coordinate system, with new 
coordinates x defined by r = a(i)x. An equivalent Lagrangian is then 

£ = ^a^^milxjp + - ^"j ^mi|xjp + ^ ^ ^ mimjF(xi, x^; i) - ^ mi$2,(xi; i). (2) 

i i i jj^i i 

For notational convenience, we define 

H ^ (3) 

a 

3 /A d\ 

Note that in principle a{t) is arbitrary, reflecting our freedom to choose the coordinate system. For 
cosmological applications one normally chooses a{t) to be the solution of Friedman's equation for 
some cosmological model, in which case p{t) is the background density and H{t) the Hubble constant 
for that model. Later in this paper we shall impose the additional restriction pa^ = constant, which 
holds for cosmological models in the matter-dominated era. 

The second term in the Lagrangian (|2|) can be regarded as involving an effective potential 

^a(x;t) = -^a^lxp (I - ^) = -f paVP, (5) 

which satisfies V^$a = —Airpa'^. Similarly, in the continuum limit and for a Coulomb interaction 
(y(x, x';t) = a^^(t)|x — x'l"-*^), the third term satisfies V2'^j 'm,jV{xi,Xj;t) = — 47r/9(xj)a^, where 
the proper density p is given by p{x)a^^ = J2j''^j^i^ ~ ^j) (respectively V2) denotes 

differentiation with respect to the first (respectively the second) of the two position vectors on 
which V depends. 



The usual derivation of the equations of motion from the Lagrangian (|2|) yields: 

i^i + 2H±i = — pxj -Fa~2^mjViy(xi,Xj;t) - a'^V^xi^ut). (6) 

O ... 



- 5 - 



For later convenience, we define 

fi = — ^a^x, + a^mjViy(xi,Xj;t) -aV$,;(xi;t). (7) 

o . , . 

This allows the equation of motion (|6|) to be written more compactly as 

Xj + 2H±i = a'^ii. (8) 
It is also useful to recast the equation in terms of a generalized time coordinate r(t): 

d'"^' +2A{T)'^=B{r% (9) 



with 



«W ^ ^f^V. (H) 



A common choice is the so-called conformal time, dr = a~^dt; another ( Efstathiou et al. 1985| ) is 
T = a" for some constant a. Naturally, any sufficiently differentiable monotonic function of t is 
acceptable, and one is free to construct such a function ad hoc for each individual simulation. We 
avail ourselves of this freedom. 



2.3. The top grid 

We lay down a rectangular grid of by Ny by nodes with uniform spacings , hy and ■ 
The values of Nx, Ny and must be acceptable to the Fast Fourier Transform (FFT) routines 
used. (All our tests were done with A^^; = Ny = Nz a power of 2, and with = hy = h^, as this 
is the most common situation in practice and the only one we needed for our applications.) At 
every step, a density is assigned to each grid point using the cloud-in-cell (CIC) algorithm. FFTs 
are then used to solve for the potential on the same grid, and an acceleration is obtained for each 
particle by differentiating the grid potential, then using CIC interpolation. We use a two-point 
finite-difference formula to compute the derivatives of the potential at grid nodes. This choice 
means that truncation errors in the force law scale with the square of the grid spacing (HE, §5-4). 

The discrete Green's function we use is obtained by sampling l/|x — x'| at grid points (and 
Fourier transforming the result). This differs from the more common approach in cosmological 
PM codes ( Efstathiou et al. 1985| , Villumsen 1989| ) of basing the Green's function on the seven- 



point finite difference approximation to the Laplacian; our choice has the merit of guaranteeing the 
truncation of the interaction law at large separations, as required for a correct implementation of 
isolated boundary conditions ( [Eastwood &: Brownrigg 19791) . Unlike in the P^M method, where 
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the grid-based part of the force must possess a good degree of translational invariance to avoid 
introducing terms in the direct particle-particle contribution that depend explicitly on the position 
of each pair relative to the grid, we are not compelled to soften the interaction at small scales. 
To do so would cost us precious spatial resolution; already with our choice of Green's function 
the effective force turns out to be softened at separations ^2.5 grid cells. A consequence of our 
decision not to soften the force law any further is that the Q-minimization procedure of HE §8-3-3, 
which requires the reference force to have no harmonics on scales smaller than the grid spacing, is 
not appropriate for us. The choice of the shape of the interaction law at small separations and of 
the charge assignment and force interpolation scheme is a matter of compromise between accuracy 
and computational cost. Our choices could undoubtedly be improved upon, but we preferred to 
concentrate our efforts on the more innovative aspects of our method. 



2.4. Boundary conditions 

Isolated boundary conditions are implemented by conceptually doubling the size of the grid in 
each dimension, and padding the density array with an appropriate constant value (normally zero) 
outside the principal octant, which contains the particles (HE, §6-5-4). If the Green's function 
is truncated so that the interaction falls to zero at separations larger than the system (before 
doubling), this completely suppresses any interaction between the system and its periodic images. 
The computed potential, however, will only match that of an isolated system within the principal 
octant. Since we need to apply a gradient operator to the potential in order to compute the forces, 
there is a layer (one cell thick for our approximation to the gradient) in which we would not be 
able to compute them. We therefore allow no particles in this outer layer. (We are free to increase 
the thickness of this layer. It has been convenient to do so in some of the tests discussed below, to 
achieve an integral ratio of cell counts in comparisons between different grid resolutions.) We do 
not use James' (1977) method to impose isolated boundary conditions without doubling the grid 
by calculating appropriate screening charges on the surface, since strictly speaking the procedure 
is only justified when a finite-difference approximation to the Laplacian is used in solving Poisson's 
equation. The EFT technique does not satisfy this condition, and would require screening charges 
throughout the volume of the box. 

In cosmological applications, the space surrounding the system is not to be thought of as 
empty, but rather as containing a background of uniform density pb- (Departures from uniformity 
can be represented through an appropriate choice for the tidal potential ^x-) We must therefore 
add this background, convolved with the same (GIG) charge assignment function that is used for the 
particles. Its contribution to the density at any node on the boundary of the region where particles 
are allowed is then proportional to the number of cells adjacent to this node that lie outside the 
particle region: pb/2 for nodes on a face, 3pf,/4 on an edge, 7pb/8 at a vertex. 

We implement the (47r/3)px term of the equation of motion (equation ^) by subtracting p 
from the density at all points before solving for the potential. If p coincides with the background 
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cosmological density, the source function for the potential outside the principal octant is exactly 
zero. This is the most common case, and the one that presents the fewest conceptual and practical 
difficulties. With periodic boundary conditions it would also be the only possible case since the 
mean density outside the simulation box must always equal that inside it. With isolated conditions 
a mismatch is permissible, and could be used for example in applying an expanding or contracting 
grid to follow the evolution of an isolated system without a cosmological density background. The 
varying a{t) would then be adjusted to match the expansion or contraction of the simulated system, 
providing better resolution at lower cost during the collapse phase. 

A very significant difference between periodic and isolated boundary conditions is that the 
latter allow the exchange of mass, momentum, energy, angular momentum between the particles 
and the exterior. With periodic boundary conditions, there is effectively no exterior. Particle flows 
are a significant concern in cosmological applications, where matter may both leave and enter the 
computational box during the simulation. In a typical cosmological model, the mass variance is of 
order unity in a sphere of radius = Sh"^ Mpc. One expects particles at the edge of a sphere 
of radius rs to be displaced with respect to the center of mass of the sphere by about rg/S, a 
significant length when compared to the size of a group or cluster of galaxies. A simulation of 
characteristic size rg may reasonably be expected to exhibit a twofold increase or decrease in the 
total mass within the box during a run. On smaller scales the variance is even larger, at least for 
the currently fashionable "bottom-up" scenarios of structure formation. 

Exiting particles are easily handled by removing them from the simulation, but incoming 
particles have to be injected according to some prescription that fits the physical problem at hand. 
In the cosmological CclSGj clS long as the density perturbations grow linearly on the scale of the box, 
a reasonable prescription can be based on extrapolations from the linear theory. At late times, in 
the strongly non-linear regime, simply extrapolating from the linear solution causes large amounts 
of material to be injected which in reality would collapse into bound objects outside the box and 
remain outside. In other words, the Zel'dovich approximation is clearly inappropriate beyond the 
time at which caustics form. It would lead to a gross overestimate of the total mass flow into the box. 
An adequate model of the inflow in the nonlinear regime therefore requires an actual simulation of 
the mass flows in a larger region. This has become normal practice in simulations with tree codes, 
and hierarchical grid methods such as ours also lend themselves well to this approach. The main 
difficulty is that in order to keep the total number of particles manageable, one must essentially run 
a preliminary simulation simply to find out where the mass that fiows into the region of interest 
originated and sample it with a finer granularity in the initial conditions. This may lead to problems 
of contamination by more massive "background" particles if the small-scale structure that is not 
resolved by the preliminary simulation turns out to have a significant impact on the dynamics. 
Furthermore, the presence of particles of different masses in the system could lead to spurious mass 
segregation effects. For these reasons, we attempted to keep the overall size of the computational 
box as small as possible, handling the tidal fields on larger scales, as well as any mass fiows (as 
long as they remained moderate) as externally imposed boundary conditions. 
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This turned out not to be a particularly successful design choice, and we cannot recommend 
it to others. Injecting too many, or too few, particles can have a destabilizing effect. The first 
term on the right hand side of equation ^ has, in the usual case p > 0, the effect of accelerating 
particles towards the boundary of the box. This is balanced by the second term, which represents 
the attraction between the gravitating particles. If more particles leave the box than are injected, 
the first term will tend to dominate and cause even more particles to be ejected; conversely, if too 
many particles are added the material will tend to collapse towards the center. This may be taken 
to represent a physical effect: a void is expected to expand faster than the universal average, an 
over-density more slowly. But unless the algorithm for replenishing the box with particles is well 
thought out, a runaway instability may occur. In practice we have found it expedient to couple the 
injection of particles to the outflow so as to maintain a constant mass within the system. We then 
monitor the cumulative mass of particles so recycled and compare it with the total mass in the 
box. If the ratio becomes too large (greater than 10% or so), we take it as an indication that one 
needs to simulate a larger volume of space. As the volume that needs to be simulated grows larger, 
our original motivation for using isolated boundary conditions becomes weaker. It turns out that 
the method we described in this paragraph works much better if the system is located in a region 
of lower density than the cosmic mean, as the mass fraction that undergoes reinjection is smaller 
in that case. This allows us to simulate a number of interesting systems, but is not suitable for 
statistical studies involving a random selection of initial conditions. 

Momentum, angular momentum, and energy can be carried by the particles that flow in and 
out of the system and by direct interaction with the external potential $2^(x;t) and the uniform 
density we impose in the space that surrounds the particle region. The latter unfortunately has 
the cubic symmetry of the computational box rather than the more convenient spherical symmetry 
that would be required for an exact cancellation of the induced tides. Consequently, these tides are 
always present except in the pure non-cosmological case with a a constant, where the background 
density vanishes. Whether this is a serious drawback depends on the physical problem being 
studied. If necessary, one can enlarge the computational box (this may be required in any case to 
keep mass flows under check), include a compensating term in or both. 



2.5. Introducing subgrids 

The idea of adding subgrids in regions where better spatial resolution is required is not a new 
one. Past approaches have differed on whether to add mass resolution at the same time by splitting 
the particles into a larger number of less massive ones, on how to minimize errors at the interface 
between the finer and the coarser grid, and on whether the solution on the coarser grid should be 
modified to take into account the results from the finer grid. 
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2.5.1. Subgrids and particles 

Unlike Villumsen (1989) and Splinter (1996), we do not automatically introduce a new set 
of less massive particles on each subgrid. We take the view that our initial mass granularity 
already matches the resolution we wish to achieve, and simply increase the force resolution (by 
adding subgrids) when and where the particle density is high enough to make this permissible and 
worthwhile. This allows the decision of where to place subgrids to be made on a step by step basis, 
and spares us the need to perform a first simulation without subgrids to find out where particles 
of smaller mass should be placed in the initial conditions. An additional advantage of having a 
single set of particles of equal mass is that we need not worry about mass segregation effects. A 
drawback is that maintaining equivalent resolution within a larger computational volume requires 
more particles. Of course our code does support multiple particle masses, and as will become clear 
below the criteria for subgrid placement can be tuned to favor tracking the lower-mass particles; 
we may therefore decide to experiment with particles of different masses in future. 

In our scheme the various grids are merely devices, introduced independently on each integra- 
tion step, to compute inter-particle forces. In this respect we are closer in spirit to Couchman's 
AP^M than to most other multiple-grid approaches. Ours is effectively (in the terminology of ANC) 
a Particle-Multiple-Mesh (PM^) scheme. This distinction will have important consequences below. 
Since there is only one set of particles that exist independently of any subgrids, the equivalent of a 
back-reaction from the subgrid solution to the parent grid is automatically included: it is mediated 
by the particles themselves. 

We typically decrease the grid spacing by a factor of two for each additional level of subgrids. 
Each subgrid thus covers a substantial number of cells of its parent grid. Other integral refinement 
factors are allowed by our formulation (subject to the condition that the boundaries of any subgrid 
must coincide with cell boundaries of the parent grid) but become increasingly inefficient in terms of 
volume coverage and we have not tried them in practice. They would also exacerbate any "ringing" 
at the interface between a subgrid and its parent grid; since this is one of the most delicate aspects 
of any hierarchical grid scheme, it is probably not advisable to use refinement factors larger than 
two under any circumstances. 

A particle passing from a region of low resolution to one of higher resolution effectively under- 
goes a sudden change in its spatial extent. (In PM codes, particles are best thought of as extending 
over about one grid cell or more, depending on the shape of the interaction law and on the charge 
assignment scheme.) In its current state, our code takes no particular precautions against any tran- 
sients that may result. Our comparison tests between runs that use adaptive subgridding and runs 
with uniform resolution show that such transients are not important enough to produce significant 
discrepancies in the results. This is fortunate, since otherwise we would have had to associate a 
time-dependent smoothing length with every particle, which would complicate the method. That 
the smoothing length would have to be associated with the particle rather than with the location 
relative to a subgrid follows both from our choice to treat the particles as primary and the grids 



-10- 



as auxiliary objects and from the fact that new subgrids may be added, removed or relocated to 
follow the flow at every step. 



Our strategy for computing the forces between subgrid particles differs slightly from that of 
Villumsen. His approach was to recalculate the potential on the entire parent grid after having set 
the density in the volume occupied by the subgrid to a constant value equal to the mean density 
within the subgrid, and use this solution as the tidal field on the subgrid particles from the rest of 
the system. The forces between subgrid particles were then computed in the usual way by solving 
for the potential on the subgrid. Instead, wc first compute forces for all particles on the parent 
grid, then generate the density array on the subgrid (whose nodes must be aligned with those of 
the parent grid) as well as on a coarser grid that has the same spacing as the parent grid and 
nodes in the same locations but covers only the volume of the subgrid. Since coordinates on the 
subgrid are affected by the same expansion factor a(t) as on the parent grid, we must subtract the 
same p. We solve for the forces on all subgrid particles from both these density arrays, and add 
the difference to the forces we had previously calculated on the parent grid. This avoids double 
counting and means in effect that pairwise forces between particles that both lie in the subgrid 
are always evaluated at the higher of the two resolutions. The coarse FFT on the subgrid involves 
far fewer grid points, so our approach is more efficient than Villumsen's. It is also more flexible, 
in that we only need to store the accumulated force for every particle and the density potential 
for a single subgrid at a time, independently of the number and nesting depth of subgrids. The 
cost of actually storing a force vector for every particle may seem high when one knows that in 
a traditional PM code with leapfrog time stepping the particle accelerations can be interpolated 
when the velocities are updated and need not be kept afterwards; but saving them allows us to use 
them to adjust the time step and to compute subgrid corrections to the energy conservation test. 
On the other hand, we could have supported independent time steps had we chosen to save the 
potential on each subgrid instead. This, however, would have caused the storage requirements to 
scale with the subgrid nesting depth. 

Let S be the set of particles in the subgrid, C its complement. Schematically, the acceleration 
on particle i is computed as follows. For i e C, 



2.5.2. Forces on subgrids 




(12a) 



For i e S, 



F = ^ mjViVp(xj,Xj) 



+ 



mjViVp{-Ki,Xj) 




^mjViVp/(xi,Xj) 



(12b) 



jes 
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where Vp is the approximation to the Coulomb potential on the parent grid, the approximation 
on the subgrid, and Vp' that on the subset of the parent grid that covers the subgrid. (In reality, 
the sums are evaluated by FFT and include the p term.) Vp = Vpi to a very good degree, provided 
only that the grid nodes are in the same locations. (This is an essential requirement: we have found 
that violating it has deleterious effects on the solution.) The two rightmost terms in the display of 
equation ( 12b| ) therefore cancel each other. Note that we add together force contributions from the 
various grids rather than the potentials; this allows the arbitrary additive constant in the potential 
to differ from grid to grid without affecting our solution. 



2.5.3. Boundary conditions on subgrids 

We impose isolated boundary conditions on the subgrid, with an external background density 
(/o) equal to that used on the parent grid. It may seem more appropriate to use the mean density 
within the subgrid, or — better yet — the mean density in the immediate vicinity of the subgrid; but 
one should note that some ringing is to be expected no matter what the exact scheme adopted 
since the density at the boundary of the subgrid always has some short-wavelength component 
that is resolved on the subgrid but not on the parent grid. The usual approach in codes of this 
type is to try to limit the ringing by introducing around the subgrid a buffer region, in which 
particles contribute to the potential but are not affected by it, and/or a transition region in which 
the potentials of the parent grid and of the subgrid are blended linearly. The transition region 
implements the variable smoothing length we alluded to at the end of the previous subsection. As 
already noted, it is of limited usefulness in our case since it doesn't guard against transients induced 
by the sudden activation of a new subgrid. The possible motivation for introducing a buffer region 
can be understood by considering the case of a particle Pq located on the subgrid but near its edge, 
and subject to the forces of two nearby equidistant particles Ps, also on the subgrid, and Pp just 
outside it. In the exact solution, Ps and Pp exert forces of equal magnitude on Pq. Without a 
buffer region, the calculated force from Ps is stronger. This can create spurious small-wavelength 
perturbations as the pair (Pq, Ps) tends to collapse on its own rather than as a triplet {Ps, Pq, Pb)- 
The introduction of a buffer region would allow the balance to be preserved between the attractions 
of Ps and Pp. 

Our implementation of the buffer region is as follows. Extending the notation introduced in the 



previous subsection, we have split C into the buffer region B and a distant exterior X. Equation |l2b 
is modified in that the sum over j € C becomes a sum over j € X and the sums over j £ S become 
sums over j € (SUB). We normally set the thickness of this buffer region to be equivalent to three 
cells of the parent grid: the grid-based force does not differ significantly from the exact Coulomb 
force at separations this large, so that Vs ~Vp. Since the thickness of the buffer region is fixed in 
units of the spacing of the parent grid, large refinement factors result in the buffer region "eating 
away" most of the volume of the subgrid. This is one of the reasons why we only experimented 
with a refinement factor of 2 at each level. 
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2.5.4- Adjacent subgrids 

An essential feature of this buffer region is that it enables us to use multiple adjacent subgrids 
to cover extended structures that are worthy of higher resolution but do not fit within a single 
grid. This could occur either because of their size or because they are located too close to other 
high-density objects already covered by their own subgrids. (We do not allow arbitrary overlap 
of subgrids, for both simplicity and efficiency.) A problem that might arise when the boundary 
between two adjacent subgrids cuts through the middle of a high density structure is that the force 
between two nearby particles separated by the boundary would be calculated with the (poorer) 
resolution of the parent grid. Thanks to the buffer region around each subgrid, this difficulty is 
avoided: the force exerted by either particle on the other will be computed at subgrid resolution 
by virtue of each particle lying in the buffer region of the other's subgrid. 

In the overlap region between adjacent subgrids, it matters little whether the border region 
is implemented as described above or in the following alternative way, by subjecting particles in 
B to the subgrid forces without including them in the subgrid potential. This corresponds to 



applying equation |l2a| to particles i G X only, and equation |l2b| to particles i G {S U B). Our 
first implementation used the latter approach, and comparison tests show only minute differences 
between the results of both variants. 

A difficulty arises wherever the buffer region has finite width and does not overlap with an 
adjacent subgrid. Then the balancing of the forces by Ps and Pb over Pq comes at the cost of a 
violation of Newton's second law: the force exerted by Pq on Pb, being computed on the parent 
grid, does not balance the force by Pb on Pq. The net result is that Pb will be accelerated outwards, 
away from the subgrid. (If the alternative implementation of the buffer region is adopted, the net 
effect has opposite sign and tends to push the particles inwards.) The consequences for the orbit 
of a bound clump can be spectacular, as the following worst-case example, illustrated by figure |l|, 
shows. We launch a 4096-particle realization of a truncated isothermal sphere (with a half-mass 
radius of 0.02 units, corresponding roughly to half the cell width on the top grid) with a mean 
velocity of 0.1 units towards a region covered by a fixed subgrid (the inner bounds of which are 
indicated by dotted lines in the figure) with refinement factor of two and a buffer region of width 
(solid curve) or 2 (dashed curve) parent grid cells. For this example we adopted the alternative 
implementation of the buffer region, which causes the particles to be accelerated inwards. The 
initial velocity is along the x axis, perpendicular to the faces of the subgrid. The figure shows the 
X coordinate of the center of mass of the sphere as a function of time t. When the buffer region is 
suppressed (width 0, solid curve) the clump enters and exits the subgrid without changes to its bulk 
velocity. (The apparent turnaround occurs when the clump reaches the edge of the computational 
volume and some particles leave the box. The solid curve reflects the mean velocity of only those 
particles that remain in the box.) The presence of a finite buffer region, by contrast, causes the 
clump to accelerate as it enters the subgrid, and to bounce back as it tries to exit on the other 
side. The reason it doesn't simply lose the momentum it gained when entering the subgrid is that 
the momentum change depends on the difference between the forces calculated on the subgrid and 
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on the parent grid, and that the passage through the more highly resolved subgrid has allowed the 
clump core to relax to a more sharply peaked density profile, increasing the force mismatch. The 
main features of this behavior are quite generic: a more carefully constructed clump with a larger 
number of particles that was allowed to settle into a numerical equilibrium before being launched 
through the subgrid underwent similar, if slightly milder, accelerations and rebounds. 

Clearly it is better for us to suppress the buffer region where there is no adjacent subgrid. The 
violation of momentum conservation that a finite border region implies can have a drastic influence 
on the orbit of a bound clump. This would not be much of a concern if we could guarantee that 
bound clumps are always entirely covered by subgrids at the same resolution; the problem does not 



arise for smooth distributions of matter where short-range forces don't predominate. (In section 3^ 
we demonstrate this fact through a test of smooth infall onto a localized seed mass perturbation.) 
In principle, an adaptive algorithm for subgrid placement based on the density should tend to 
ensure this. However, one may want to place additional restrictions on which regions are followed 
at higher resolution, in which case the guarantee does not hold and we must take the precaution 
of suppressing the buffer region in the absence of an adjacent subgrid. Note that the problem is 
less severe when expanding coordinates are used, as in cosmological applications, since the peculiar 
velocity acquired while crossing a subgrid interface will be damped away. 

Why has this same difficulty not been recognized by other authors, notably ANC and Splinter 
(1996)? The reason may be traced to a fundamental difference in philosophy between our code 
and theirs. The very idea of a particle on a subgrid interacting symmetrically with a particle on 
the parent grid is a consequence of our decision to regard the particles, rather than the density 
field on the grids, as primary. In a scheme where every particle is associated with only one grid, 
and particularly when the dynamics of the parent grid particles are unaffected by the subgrid (the 
so-called "one-way interface" schemes), momentum conservation should be examined separately 
on each grid: on the parent grid, no violation is expected, while on the subgrid the effects of the 
parent grid particles are treated as an external field and the buffer and transition regions are used to 
smoothly bring any particles exiting the subgrid under the control of the parent grid field alone, as 
test particles, rather than reflecting them back into the subgrid. Thus, the apparent contradiction 
between these authors' use of a buffer region two cells wide and our restriction of the buffer region 
to the sole case of adjacent subgrids does not signal an error either on our part or on theirs, but is 
merely a manifestation of our different view of the respective roles of particles and grids. 



2.5.5. Subgrid placement algorithms 

A distinguishing feature of our code is that the activation of subgrids is entirely automated: the 
user need only set a few parameters (maximum depth, and various optional thresholds), after which 
the code decides at every step where to place subgrids. Tuning the criteria for subgrid placement 
is not an easy task; our current choices can undoubtedly be improved upon. Here we shall describe 
some of the criteria we have implemented. Not all of them have proved very useful in practice. 
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Our subgrids have fixed shape and size, and may not overlap. Consequently, we found it 
simplest to introduce a uniform tihng of subgrids covering the entire volume of the parent grid. 
Each subgrid may be either active or inactive. A subgrid is activated whenever our requirements 
for subgridding are satisfied within its volume. We are free to choose the origin of the tiling 
independently for every parent grid at every step. We make use of this freedom to reduce the 
number of active subgrids. Typically we try eight different possible origins, and adopt one that 
requires the activation of fewest subgrids. 

The first criterion for subgrid activation is that the particle number density be sufficiently 
high in at least part of the covered region: there is no point in the grid ever being much finer 
than the smallest distance between particles. To activate a subgrid, we require that among the 
corresponding cells of the parent grid either (a) at least one cell contains more than Nq particles; 
or (b) one cell contains at least A'^i particles and its 26 immediate neighbors together contain a 
further A^3 — A^i particles or more. A^o^ and N3 arc specified as input parameters to the code. 

The main reason for considering cubes of 3 x 3 x 3 cells as well as single cells is that one needs 
to detect the collapse of a high-density region before it is too advanced: the additional resolution 
is already required during the later stages of collapse. We don't try to detect larger, lower-contrast 
potentially collapsing structures on larger scales since for separations larger than about three cells 
the forces are computed accurately on the parent grid. 

One way of choosing Ni and N3 is the following. Let N^s be a measure of the mean number of 
particles per cell in some larger region, such as the entire candidate subgrid. (This measure need 
not be unbiased; in fact, if the mean particle number density is very low we found it necessary 
to artificially increase N^s by setting it to the value 1. Otherwise our subgrid placement criterion 
would be satisfied far too easily in low-density regions.) If the particle counts were distributed 
according to Poisson statistics, one would expect a group of ric cells to contain on average ricNeS 
particles, with a standard deviation (ricNes)^^'^ around that mean. By setting Ni = NeS + N^N^g^ 
and Ns = S^N^s + ^^a(3^iVcff)^/^ where is a tunable parameter, we try to ensure that only 
particularly significant deviations from homogeneity trigger subgrid activation. 

We have found that this approach works reasonably well during the early stages of nonlinear 
evolution, where it causes higher resolution to be applied to the regions we are most interested 
in. However, as the simulation progresses wc have needed to increase N^- in order to limit the 
proliferation of subgrids, as our goal was merely to achieve high resolution in a few large peaks 
near the center of the computational box. Clearly the criteria can and should be tuned according 
to the needs of individual applications; no single choice is optimal for all cases. 
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2.6. Time step selection 

We follow the usual practice in PM codes of advancing the positions and velocities of the 
particles in leapfrog fashion: 

r/x- 

Xi(r + <5T) = xi(r)+<5r-^(r). (14) 

or 

An alternative formula applies on a starting step, when both the positions and the velocities are 
known at the same time r: 

x,(r + 5r/2) = x,(r) + — (^1 - ^ + — S(r)f,(r). (15) 

Likewise, one can synchronize positions and velocities at the final step by advancing the velocities 
from T — 6t/2 to t + 5t/2 then updating the positions as 

x.(r + .>/2) = ..M + |(l + :l(l)^l)^-^l!BMr.M. (16) 

In large simulations, it is particularly important to avoid taking more time steps than required 
to obtain accurate results. The time step should therefore be allowed to vary. In the standard 
leapfrog formulation, the cancellation of second-order terms in equation ( p!3[ ) depends on A being 
evaluated at the midpoint of the velocity step. The cancellation of the second-order term in the 
formula to update the positions is less important since the fj are known; however, it contributes 
to lowering the operation count for the method. Various approaches can be used to change the 
time step in the middle of a simulation. One is to use a different midpoint in equation ( p!3|) and 
calculate the corresponding A(t). The other, which is the one we adopted, is to keep 6t constant 
by an appropriate choice of the function T{t). We construct this function and its first and second 
derivatives, which are needed to evaluate A and B according to equations (^) and (p|), step by 
step as the simulation progresses. (In practice, we do not allow dr/dt to vary too quickly, since 
that tends to spoil the results. We shrink 6t by at most 25% on each step, and let it grow even 
more slowly. Situations in which this forces a larger 5t than desired have fortunately been very rare 
in our runs.) 

Our preferred criterion for determining the time step is the Courant condition: 

<5t<emin-^ (17) 

j \Xj\ 

where the index j spans the set of all particles, hj is the grid spacing of the finest grid used to 
compute the forces on particle j, and e < 0.5 is a constant coefficient. Xj should be taken as being 
the mean velocity over the time step, and is a function of both the velocity before the step is taken 
and of the acceleration. The latter can be important if the initial velocity is uncharacteristically 
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small, for example when all the particles are initially at rest. The equation for the maximum 

acceptable 5t for each particle is actually a quartic. We do not solve it exactly, preferring to 
estimate a close lower bound on the solution for the particles with the tightest 5t constraints. This 
way of taking the accelerations into account fulfills the same function as other authors' use of the 
maximum density on a subgrid to estimate a local dynamical time. 

In some cosmological simulations with a large number of particles, we found that adopting the 
smallest 5t found in this fashion was leading to time steps much shorter than warranted by our 
physical understanding of the dynamical time scales involved (based on the maximum density) . This 
turned out to be due to a small number of high velocity particles: the tail of the velocity distribution 
produced by the violent relaxation of a newly collapsed object, which is naturally better sampled as 
the number of particles in the simulation increases. Since these particles represent a small fraction 
of the mass in the typically high-density regions in which they are found, their motion has almost 
no impact on the mean gravitational field, which evolves on much longer time scales. This led us to 
modify the time step criterion for these simulations by pretending that the velocity of each particle 
before the time step is negligible and basing the choice of St solely on the accelerations. This is 
not unreasonable for cosmological applications since in the linear regime the velocities and peculiar 
accelerations are related and in virialized clumps they are a better indication of the dynamical time 
scale than the velocity of the fastest particle (which is not the same as the local velocity dispersion). 
Moreover, expanding coordinates have the property that peculiar velocities are damped away and 
need to be continually regenerated by the forces. As a further precaution, we have analyzed the 
velocities of the particles at a late stage in a simulation run with this relaxed time step criterion 
and found, as intended, that the number of particles that violated the stricter Courant condition 
was small. The relaxed criterion is known to fail, however, for highly ordered collapse (such as that 
of a homogeneous sphere) . In that case the largest acceleration does not provide a good estimate of 
the time scale over which the mean field evolves: based as it is on the more extended distribution at 
the beginning of the step, it systematically overestimates the maximum allowable time increment. 
Accordingly, we only resort to the relaxed criterion, with some reluctance, for runs in expanding 
coordinates, with large numbers of particles, and where different mass shells collapse at different 
times. Our tests on smaller cosmological runs have shown the results to be substantially identical, 
but with the relaxed condition requiring a much smaller number of steps. 

We use a single value of 6t for all particles on all subgrids at any given step. A far better 
approach in principle, but much more complicated to implement, would be to adopt multiple time 
steps and to distinguish between the time steps used for advancing individual particles and those 
for updating the potentials on subgrids. Independent time steps for the particles would require a 
way of estimating the accelerations at positions and times slightly different from those for which 
the potential has been calculated. For this, it may be necessary to calculate the time derivative 
of the subgrid potential as well as the potential itself. In addition to the constraints resulting 
from the rate of change of the subgrid density, the frequency with which the subgrid potential 
is recalculated can also be affected by particles flowing out of the region in which forces can be 
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interpolated from the subgrid potential: in our scheme, such events change the mapping between 
particles and subgrids and require recalculation of all the subgrid potentials involved if the total 
momentum is to be conserved. These are our main reasons for adopting a single time step for all 
grids. The other main reason is that a single time step allows us to process subgrids one by one, 
accumulating the force contributions on each particle without needing to store the solutions for 
the potential on many nested subgrids simultaneously. Almost any other time step scheme requires 
these subgrid potentials to be available simultaneously. 

The coefficients in the leapfrog formulae (equations ^ and 14) can depend on time. For good 
accuracy, their variation in a time step must be small. This constraint is clearly unrelated to the 
grid spacing, and must be imposed separately. It is equivalent to the requirement that \H5t\ <^ 1. 



2.7. Conservation laws 

Conservation laws provide an important diagnostic of how accurately the equations of motion 
have been integrated in a given simulation. Good conservation of physical invariants is not a 
sufficient condition for a good integration, but it is a necessary one. 

In the absence of coordinate expansion, subgrids, and external fields, a PM code such as ours 
would be expected to conserve total momentum algebraically, energy a little less well, and angular 
momentum only in the limit where the interparticle separation is much larger than the grid spacing 
(HE §7-6). 



2.7.1. Energy 

Cosmic expansion causes the usual equation of conservation of the total energy, E = T + W = 
constant (where T is the kinetic, W the potential energy), to be replaced with the Layzer-Irvine 
( Irvine 1961| ; [Layzer 1963| ) equation, which can be derived from the well-known property of the 
Hamiltonian Ti: 

d-H d-H 

Computing the canonical momentum from equation |2|, 

dC 2 

ma Xj, (19) 



it follows in the usual way that 

n = ^ Pi • xj - £ 



j * i i jj^i i 
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We define the kinetic and potential energies as: 

K ^ a^T^V^ (21) 

1 2tt 
^ ^ --a^^mimjy(xi,Xj;t) - — a^a^mj|xjp. (22) 

i j^i i 

Then 

rL = a^T + a-^W + Y,mi^.,{^i-t). (23) 

i 

Our exclusion of the external potential from the definition of W is somewhat arbitrary. In 
practice, one wants to separate the terms associated with the mutual interactions of particles, 
which do measure the accuracy of the integrator, from the correction terms that merely account 
for known, external sources and sinks of energy. 

In applying equation one notes that in 

— = -Ha-^W + a-^— (24) 

the last term does not vanish if aV depends explicitly on time (as may occur in practice when 
a new subgrid is activated and the force resolution increases locally), or if a'^p is not constant. 
For simplicity, we shall assume a'^p = constant from now on, since that is the only case of actual 
practical interest to us. 

Straightforward algebra leads to the two equations: 

^ (a^T + W)+ Ha^T + a ^ mii, • V$^(xi; t) + ^ E E "^^^^^'^ = ^ 

^(^a^T + aW) -i^aVF + a^E^iiii- V$^(xi;i) + ^aEE"^i"^i^ = (26) 

i i jj^i 

which although physically equivalent are evaluated numerically in slightly different ways when a(t) 
is not constant. 

The corresponding conserved quantities are: 

C = a^T + W + J Ha^Tdt + J aY^mi±i-V<^xixi;t)dt 

i 

1 /■ daV , . , , 

+ 2 / Z^}^^i^j-Qf{^i^^j'^t)dt (27) 

C = a{a^T + W)- j HaWdt + j a'^^mi±i-V^x{->^t,t)dt 

i 

+ j a'Y^^mimj^^{yLi,y.j]t)dt (28) 
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In proper-length coordinates (a = 1), the first integral term (+ J Ha'^Tdt or — J HaW dt) 
vanishes and both criteria reduce to the usual law of conservation of energy. The last two terms 
represent respectively the work done by external fields and time dependence in the interaction 
law. One may question the appropriateness of including the latter effect, since its origin lies in the 
numerical method used rather than in the underlying physics. Shouldn't contributions from the 
d{aV) /dt term be counted as violations of energy conservation by the code? In reply we point out 
that much of that term may result from fluctuations in the zero level of the gravitational potential 
on subgrids, which we did not attempt to suppress since they have no bearing on the computed 
forces. And in any case the d{aV)/dt contribution should be evaluated so that its magnitude 
relative to the other terms may be assessed. 

It is unrealistic to expect a PM code to conserve C and C arbitrarily well when gravitational 
instability causes the individual terms {W, a^T) to grow by orders of magnitude. Like many other 
authors before us, we deem C and C adequately conserved if their variation is a small fraction 
(typically 1-3%) of the change in W or aW, respectively. 

In our definition of W (equation 22) we excluded the self-energy terms X)j "^i ^(^ii t). But 
the natural way of calculating the potential in a PM code effectively includes these terms (the 
restriction j ^ i can be ignored since the scheme avoids self- forces) ; we subtract them explicitly as 
part of our energy conservation test. 



2. 7. 2. Momentum 

We adopted a momentum-conserving scheme for calculating the forces. The total momentum 
should therefore be unaffected by particle-particle interactions on any given grid, and (given the 
precautions we have taken in our handling of the buffer region surrounding each subgrid) be only 
weakly modified by interactions across the boundary between adjacent subgrids. However, it is 
still affected by all other interactions between particles and external fields, including the constant- 
density background outside the computational box. (This field gives rise to forces only when isolated 
boundary conditions are used, which is the reason why it isn't normally discussed in the literature.) 

The canonical momentum pj of particle i satisfies 

Pi = = ^pa^miJCi -hmj^ Viy(xi,Xj;t) - rmV^xi'^ut). (29) 

Summation over all particles and time integration yield a conserved quantity 



Pc 



i i i jj^i 



In the special case when (i) has no spatial dependence, (ii) J2i i^^i^i = or p = 0, and (iii) 
Vi^ -|- 'V2V = 0, the total momentum is constant. If one's aim is to measure the deviation of the 
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results from those expected when \/iV + \/2V = (such a deviation can occur in the presence of 
subgrids), one should naturally omit the corresponding term from equation 

2.7.3. Angular momentum 
Likewise, the conservation of angular momentum = x pj amounts to the constancy of 
Jc = Y^3i- jY^^mimjXixViV{xi,Xj;t)dt + J Y^niiXi x V^oci^ut) dt. (31) 

Angular momentum is only conserved if (i) (xj x Vi +Xj x 'V2)V = and (ii) is a function of |x| 
and t alone. The exact Coulomb force law satisfies the first condition, but grid-based approximations 
to it don't. As in the analogous case for the momentum, if the aim is to measure deviations from the 
ideal behavior where (i) is satisfied, then one should compare the magnitude of the corresponding 
term to J2i Jj itself, and to the magnitude of the external torque term when present. 

3. Tests of the method 

3.1. Force errors and subgrids 

Our first tests are meant to see how accurately the code calculates the accelerations of particles 
given a known mass configuration. The simplest case is that of the forces due to a single point 
mass, which can be compared to the value predicted by Coulomb's law. Figure § illustrates this 
comparison. It shows the accelerations induced by a single particle at the center of a 32^ grid with 
isolated boundary conditions. A single 32^ subgrid, twice as fine as the top grid, was centered on 
the massive particle. Two sets of points are readily distinguished, corresponding to test particles 
inside and outside the subgrid. As expected, the acceleration is systematically underestimated since 
the effective potential is softer than that of a point mass. Also as expected, the subgrid extends the 
range over which the force is accurate to within a few percent; the accuracy threshold (a little over 
3% in this test) could be lowered by increasing the size of the subgrid, the spacing being equal. 

Our second test checks that the accelerations computed on a set of adjacent subgrids agree 
with those one would obtain on a single, larger grid with the same spacing as the subgrids. We 
compare forces on individual particles randomly chosen to sample a uniform distribution. The 
long-range forces, which are adequately resolved on the parent grid, tend to cancel, so that the 
system is dominated by short-range interactions that will be most affected by the grid refinement. 
Our reference forces are from a non-subgridded run with 56^ cells and 56^ particles. We compare 
these to those obtained from three different runs with only 28'^ cells on the parent grid, in which 
all possible first-level subgrids are activated. The only difference between the three runs is in the 
width of the buffer region where adjacent subgrids overlap. The results are illustrated in figure |^. 
The magnitude of the force difference is plotted against that of the reference force for a randomly 
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selected 1% of the particles (to avoid overcrowding the plot). Panels (a), (b), and (c) correspond 
respectively to a buffer width of 0, 1, and 2 parent grid cells. The solid diagonal line represents 
|5F|/|F| = 0.1; the last panel also contains a dotted line where |(5F|/|F| = 0.01. The results show 
that when the buffer is sufficiently wide (at least two cells; we like to use three cells in production 
runs with larger grids), the forces computed by the subgrid method are mostly within 1% of the 
forces from a single grid of equivalent resolution. Whenever the buffer region is suppressed, an 
appreciable fraction of the particles exhibits force errors larger than 10%. Here it is essential 
to understand what we mean by "errors" . Figure |3| only compares the forces computed using a 
set of abutting subgrids of equal resolution with those from a single, larger grid with the same 
spacing, and shows that some overlap is both required and sufficient to avoid loss of resolution at 
the boundary between such abutting subgrids. It does not compare forces computed at different 
grid spacings, and does not address the issue of convergence to a "perfect" solution in the limit of 
an infinitely fine grid. We know that at the interface between a subgrid and a coarser parent grid 
the accuracy of the forces within the outer two or three cell layers of the subgrid will be less than 
that of a uniform fine grid regardless of the thickness of the buffer region. (It will, however, be 
no worse than on the parent grid alone.) We compensate for this by making the subgrids cover a 
slightly larger volume than the region where the higher resolution is required. 



3.2. Collapse of a prolate ellipsoid 

The collapse of a homogeneous self-gravitating pressure-less ellipsoid initially at rest can be 
described by a small set of coupled ordinary differential equations for the lengths of the principal 



axes, which can be integrated in a straightforward way ( Lin, Mestel &: Shu 1965 ). This solution 
provides a convenient test of our code. The test corresponds to that of collapse to a sheet or filament 
in a cosmological code with periodic boundary conditions, but is more appropriate to our use of 
isolated boundary conditions. In particular, the plane-wave test of Efstathiou et al. (1985) is only 
easy to interpret and compare with a semi-analytic solution when periodic boundary conditions 



are used, and these lie outside the scope of the present article. Another test, in section 3^ below, 
shows the behavior of our code when integrating the collapse of a moving sheetlike ridge in the 
presence of cosmological expansion. 

We followed the collapse of a homogeneous ellipsoid with a 2 : 1 : 1 ratio between the lengths 
of the principal axes. The experiment was repeated for two different choices of the grid spacing in 
order to verify convergence towards the correct solution. For the higher resolution, we performed 
two runs, one using a single grid of 128'^ cells and one with a top-level grid of only 32^ cells but up 
to two levels of subgrids. Each subgrid also has only 32^ cells; when necessary, multiple adjacent 
subgrids were automatically generated by the code to cover the entire ellipsoid. The number of 
particles is the same in all three runs, about 10^. 

Figure ^ shows the short axis c of the ellipsoid as a function of time t. The solid curve 
represents the analytic solution, the dot-dash curve the solution computed on the coarse grid, 
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while the remaining two dashed curves show the results of the two high-resolution runs. The 
horizontal dotted line corresponds to the grid spacing in the high resolution runs; that of the low 
resolution grid is four times larger. Figure ^ presents a blown-up view of the same results around 
the time of collapse of the ellipsoid to a spindle. 

One sees from these figures that increasing the resolution does indeed yield a better agreement 
between the A^-body and analytic solutions. Furthermore, the results with subgridding are nearly 
identical to those obtained with a single grid of equivalent resolution. In the high-resolution runs, 
the minimum radius of the spindle is comparable to the grid spacing. At the lower resolution, it is 
significantly less than the grid spacing. We did not investigate the reasons for this in detail, but 
it is likely that the convergence is also affected by the number of particles used to represent the 
ellipsoid. In these tests we have kept this number constant. 

The two runs without subgrids also used a small, constant value of the time step 5t = 0.001. 
Collapse occurs after 1524 and 1644 steps respectively. The subgridded run, by contrast, used 
the Courant condition to adjust the time step; only 125 steps were required to reach the point 
of collapse. The good agreement between the high-resolution results obtained with both choices 
of time step confirms the validity of our implementation of adaptive time steps. We have also 
repeated the low-resolution run using our adaptive time step scheme; on figure ^ the results would 
be indistinguishable from those obtained with a fixed time step. 



3.3. Secondary infall across a subgrid boundary 

An interesting test of the behavior of mass flows into a subgrid is the case of accretion from 
a uniform-density background onto a point mass (or any other compact, spherically symmetric 
density profile). A regular lattice of 56^ particles with zero initial peculiar velocity was laid down in 
a computational box of unit side to be evolved with isolated boundary conditions in an = 1, A = 
cosmological model. We added at the center of the box a single particle of mass m = {4:TT/3)pbxf 5i, 
where pb is the mean density of the other particles and of the external background, Xi = 0.1, and 
5i = 1. We followed the collapse to a time tf = 1500tj, corresponding to an expansion factor 
af/a, = {tf/uf/^ = 131. 



A scaling solution is available ( pott 1975 ; Gunn 1977; Fillmore &: Goldreich 1984; Bertschinger 



1985): p{r) oc r~^^^. This simple law is expected to hold only for those shells that enclose a mass 
much larger than that of the initial seed and that have evolved for a few crossing times after their 
initial turnaround. The crossing time for a shell is of the same order as its turnaround time, 
and its proper radius is a factor / ~ 0.5 times the turnaround radius. Using a linear theory 
approximation (in which the density perturbation grows in place) until the turnaround at a mean 
enclosed overdensity (5 ~ 1, we find that for a given initial enclosed overdensity 6i the turnaround 
occurs at a = ai6~^ , the subsequent virialization at a ~ 2ai6~^ , and the final comoving radius 
of the shell is f Xi{ai / a f )6~^ , where Xi is the initial comoving radius. Requiring 5i < 0.1, only 



- 23 - 



shells with Xi > £j((5i/0.1)^/^ = 0.1 x 10^/^ ~ 0.215 are expected to show self-similar behavior, and 
then only for o/aj ^ 20. One can also evaluate Bertschinger's (1985) equation (2.7) with Ri = Xi, 
5i = 5i, T = 1500 (these values lead to rt^{tf) = 0.237a(tj)) and make direct use of his numerically 
determined similarity solution. 

A twist of our simulations is that shells with Xi > 0.5 are prevented from falling in by the fact 
that they are not entirely included within the simulation volume. Accordingly, the accretion will 
be starved for 6i ^ 10~^, and the last complete shell should virialize at a/a, ~ 200. 

Our main reason for performing this test was to compare the solutions obtained with various 
treatments of the border region around a subgrid. We performed four runs: one with a uniform 64'^ 
grid, the other three with a 32^ grid and a nested 32^ subgrid with a linear refinement factor of 2. 
(After subtracting edge cells as discussed in § 2^, the computational box was covered respectively 
by 56^ and 28^ cells.) In one run the border region was suppressed, as suggested by our tests with 
a bound clump crossing the boundary, while the other two both used a border width of two parent 
grid cells, once with the particles in the border region contributing to the subgrid potential without 
feeling its effects and once with the border particles being subjected to the subgrid potential without 
contributing to it. In the last two cases, the net effect of the force asymmetries on overdense regions 
results in an acceleration pointing respectively away from the subgrid and towards it. 

Figure ^ shows the logarithmic density profiles for the runs we just described. The differences 
between the results of all these runs are minute; we find it difficult to ascribe any significance to 
such small deviations. This is reassuring: it suggests that the exact way in which subgrid borders 
are treated does not unduly affect the profiles of collapsed peaks. The dotted line connects open 
triangles that correspond to the values given by Bertschinger (1985) in his Table 4. At small radii, 
his solution tends towards the expected p oc r~^/^, and ours is in good agreement with his. The 
first few caustics can easily be identified. 



3.4. Expanding spherical void with compensating ridge 



Peebles (1987; see also Peebles at al 198£, hereafter PMHJ) has proposed and used the following 
test for cosmological PM codes: integrating the evolution of a spherically symmetric under-dense 
region surrounding by a compensating over-dense shell. The interior of such a void expands faster 
than the universe as a whole, and in so doing compresses the surrounding shell, making its profile 
higher and narrower. The evolution can be computed analytically until the time at which the first 
density cusp appears. For Peebles' profile, 

<5(r) = -5o(^l-^^)e-'->^ (32) 

with 5q = 0.1, this occurs at a/cj ~ 120 in a flat = 1 universe. (Here Oj is the initial value of 
the expansion factor a.) We integrated such a void, realized with 64'^ particles, on a 32^ grid and 
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compared our results to the analytic prediction at ajai = 60. Figure ^ can be directly compared 
to the corresponding figure 4 of PMHJ. A significant difference between our code and theirs lies in 
their use of a staggered grid for the force ( Melott 19861 ). The staggered grid increases the spatial 



resolution twofold (which is why these authors used it) at the cost of inducing self-forces on the 
particles (for which reason we did not follow their example). Because of this difference, our results 
obtained with 64'^ particles on a 64^ grid correspond to their results with the same number of 
particles on a comparison reveals differences of detail, particularly in the structure of 

the outer parts of the shell, but our results match the analytic solution as closely as theirs. 

We take this occasion to illustrate the energy conservation properties of the code. Figure ^ 



shows the evolution with time of the conserved quantities C and C' of equations 27 and ^ (top), 
as well as that of the ratio |AC|/|AT1^| of the changes in C and in the total potential energy W 
from the start of the run (bottom) . Apart from an initial transient due to rounding in the print-out 
from which the plot was made, an examination of the ratio shows the energy to be conserved to 
about 1-2% accuracy throughout the run. 



4. Conclusions 

We have developed a useful and versatile tool for the simulation of collisionless systems of 
gravitating particles. Our code is particularly suited to situations that call for both fine-grained 
mass resolution everywhere and high spatial resolution in selected regions of high density the exact 
locations of which need not be known in advance. Our method is a development of the well- 
known Particle-Mesh technique. Tests indicate that in comparable conditions our code performs 
substantially like similar codes described in the published literature. When subgrids are introduced 
to increase the spatial resolution locally, the results in the regions covered by the subgrids are in 
very good agreement with those of a conventional PM code with a single, finer grid. If the number of 
subgridded regions is sufficiently small, our approach requires less computing time and less memory 
than the equivalent single-grid approach. Furthermore, by not increasing the resolution in regions 
where the particle number density is low, we avoid making the behavior of the code collisional, 
which would be undesirable for applications to collisionless systems. 

A significant advantage of PM (and its variants P^M, AP^M, etc.) for cosmological applications 
is that comoving coordinates and periodic boundary conditions can be supported in a natural way. 
However, periodic boundary conditions are not always a physically appropriate approximation. In 
particular, if one's interest is in systems not much smaller than the size of the computational cube, 
periodic boundary conditions introduce a much stronger coupling between the external tidal field 
and the internal dynamics of the system than one would expect to occur in the aperiodic real uni- 
verse. Periodic boundary conditions are only appropriate when the individual structures of interest 
are much smaller than the computational volume, in which case the tides due to periodic images 
are small; but then the relatively small dynamic range of conventional PM methods severely limits 
the resolution that can be achieved. By introducing hierarchical subgrids, we are able to extend 
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that dynamic range; however, it is also tempting to construct the simulation in such a way that the 
system of interest fills as much of the computational box as possible. This is what led us to imple- 
ment isolated boundary conditions. In principle, the additional cost of imposing isolated boundary 
conditions (which is relatively small when hierarchical grids are used since isolated conditions have 
to be imposed on the subgrids in any case) is offset by the greater freedom one has to apply an 
arbitrary time-dependent external tidal field and to adopt a system of expanding coordinates that 
matches the evolution of the simulated object without necessarily coinciding with the expansion of 
the global cosmological model. Of course our method can also be used in a more conventional way, 
with periodic boundary conditions on the top grid; in fact, numerous simplifications occur when 
this is done. In addition, our code should be well-suited to non-cosmological dynamical simulations, 
e.g., of brief interactions between already formed galaxies, star clusters, and so on. 

Our approach also has a few limitations. The most important is shared by all particle simula- 
tion methods: it is generally impossible to improve the spatial resolution without a corresponding 
refinement of the time resolution: more time steps need to be taken. Other difficulties of note 
are the non-radial nature of forces at small separations (which could be cured, at some cost in 
computing time, by softening the interaction law and increasing the depth of subgridding to com- 
pensate for this softening), the complexity and difficulty of tuning the subgrid placement algorithm, 
and the fact that subgrids of fixed shape will almost inevitably also cover some regions where the 
particle number density is low. The behavior of the code can become coUisional in such regions; 
this should not affect the computed structure of high-density condensations, but may well be a 
source of high-velocity particles that lead to shorter time steps according to the Courant condition. 
An enhancement to our current code could be to refrain from applying the refined forces to such 
low-density cells. Alternatively, it could prove useful to introduce individual time- varying softening 
lengths associated with individual particles, to represent the fact that particles in this method are 
to be thought of as possessing a finite extent that depends on the local number density. Another 
desirable improvement is the adoption of different time steps at different levels of grid refinement; 
the savings that can be achieved from this depend on the detailed structure of the tree of subgrids, 
and will be greatest for deep trees with most of their branches at the finer resolution levels. 
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members of the center's Corporate Partnership Program. We thank Adrian Melott, Richard James, 
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Fig. 1. — Trajectory of a bound clump crossing a fixed subgrid at normal incidence, (dashed curve) 
with and (solid curve) without using a buffer region two cells wide around the subgrid. The figure 
shows the projected position x of the center of mass as a function of time t. The horizontal dotted 
lines show the edges of the subgrid proper; the border region, when used, extends outwards by 0.07 
units on both sides. The use of a buffer region is seen to produce significant unphysical changes in 
the total momentum of the clump. 
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Fig. 2. — Relative error in the acceleration induced by a single massive particle on a number of 
massless test particles at various distances. The massive particle lies at the center of a subgrid 
whose spacing is half that of the top grid. The abscissa is measured in units of the top grid 
spacing; the ordinate is the relative error in the acceleration compared to the Coulomb formula. 
The error only reaches 10% for separations smaller than 1-1.5 cells (depending on the orientation 
of the separation vector); it is less than 4% outside the subgrid, a number that would be lower still 
if larger grid sizes had been used (this test was done with 32^ nodes per grid). The two groups of 
points correspond respectively to test particles inside and outside the subgrid. 
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Fig. 3. — Comparison of the forces from the subgridding procedure with those from an equivalent 
single-grid PM code. The abscissa is the magnitude of the reference force, the ordinate the mag- 
nitude of the force difference (computed as a three-dimensional vector). Each point represents one 
particle; only a random selection of about 2000 particles is shown. The force differences depend 
on the width of the border region for overlap between adjacent subgrids. Panel (a) corresponds 
to a null width, panel (b) to a width of one cell, panel (c) to a width of two cells. The solid 
diagonal straight line shows where the force difference amounts to 10% of the force. In panel (c), a 
dotted line where the force difference is 1% is also plotted. One sees that when the border width 
is sufficiently large, the subgrid forces are substantially equivalent to those that would obtain from 
a single larger grid with the same spacing. 
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Fig. 4. — Minor axis b = c of a collapsing prolate ellipsoid as a function of time. Solid curve: 
analytic solution. Long dashes: low-resolution (32'^ grid) PM simulation. Short dashes: high- 
resolution simulation (128^ grid, no subgrids). Dot-dash curve: simulation with 32^ base grids and 
two levels of subgrids, equivalent to the high-resolution simulation. 
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Fig. 6. — Radial density profile after spherical infall. The decimal logarithm of the density contrast 
5 = p/pb is shown as a function of the logarithm of the comoving radius x, for simulations of a 
sharp peak accreting from a medium of uniform density ph for an expansion factor aj/ai = 131 in 
a critical universe. The results of simulations realized with a single 64^ grid and with a coarser 32^ 
grid and one subgrid of equivalent resolution to the 64^ grid are overlaid. The results are insensitive 
to the details of the treatment of particle flows through the subgrid boundary. The open triangles 
and the dotted line that connects them arc from a table published by Bcrtschingcr (1985), rescaled 
according to his prescriptions. The slope, the normalization, and the locations of the first few 
caustics are in good agreement. 
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Fig. 7. — Result of evolving a spherical hole with a compensating ridge, as in Peebles (1987) and 
Peebles et al. (1989). The density contrast (ratio of the local value to the mean value for the 
cosmological model) is shown as a function of radial distance from the center of the hole. The 
curve is the analytic solution. Points show the mean density in equally spaced radial bins, obtained 
by running our code with a total of 64^ particles and 64"^ grid cells. 
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Fig. 8. — Energy conservation in the expanding void test. The top panel shows the evolution as a 
function of time {t = 3/2(a/aj)^/^) of the conserved quantities C (equation solid curve) and C 
(equation dotted curve). The bottom panel compares the change in C, respectively C, to that 
in W, respectively aW. The initial transient is imputable to the round-off one expects when both 
I AC I and |ATy| are very small. 



